Abstract



One of the features inherent in nested Archimedean copulas, also called
hierarchical Archimedean copulas, is their rooted tree structure. In this paper,
a nonparametric, rank-based method to estimate this structure is developed. Our
approach consists in representing the rooted tree structure as a set of
trivariate structures that can be estimated individually. Indeed, for any triple
of variables there are only four possible rooted tree structures and, based on a
sample, a choice can be made by performing comparisons between the three
bivariate margins of the empirical distribution of the triple. The set of
estimated trivariate structures can then be used to build an estimate of the
global rooted tree structure. This approach has the advantage that no assumption
about the nested Archimedean copula is required prior to the estimation of its
structure.

Keywords: Archimedean copula, dependence, nested Archimedean copula,
hierarchical Archimedean copula, rooted tree, Kendall distribution,
nonparametric inference

1. Introduction



Archimedean copulas have become a popular tool for modeling or simulating
bivariate data. However, because of their highly symmetric nature, they usually
fail to properly model data in higher dimensions. Nested Archimedean copulas
(NACs), or hierarchical Archimedean copulas, are an interesting attempt to
overcome this drawback. They were first introduced by Joe (1997) and have
been studied many times since, see for instance McNeil (2008), Hofert (2010),
Hering, Hofert, Mai and Scherer (2010), Hofert and Maechler (2011), Hofert and
Pham (2012) or Okhrin, Okhrin and Schmid (2013).



The hierarchy of variables in a nested Archimedean copula is described
through a rooted tree. Most often, the tree is taken as given from the
context, for instance in Hofert (2010) or in Puzanova (2011). Okhrin, Okhrin and
Schmid (2013) were the first to address the issue of reconstructing the tree from
a sample, offering a parametric answer to the problem. In contrast, the method
we propose is completely nonparametric and does not require the user to make
any assumption on the NAC from which the tree structure must be estimated.

Sections 2 and 3 of this paper introduce Archimedean copulas and nested
Archimedean copulas. The fourth section adds a condition on nested Archimedean
copulas to ensure the tree structure is always well identified.

Section 5 shows a convenient way to store a NAC tree structure which will
be used throughout the rest of the paper, while Section 6 introduces a key point,
namely that a NAC structure can always be represented as a set of trivariate
NAC structures. That is, for a random vector of continuous random variables
$(X_1, \ldots, X_d)$ with a NAC as joint copula, it is possible to get the tree
structure of this nested Archimedean copula provided the tree structure of the
nested Archimedean copula associated with each triple of variables
$(X_i, X_j, X_k)$ with distinct $i, j, k \in \{1, \ldots, d\}$ is known.

Next, it is shown in Section 7 how the NAC structure of a triple of variables
$(X_i, X_j, X_k)$ can be estimated nonparametrically. The idea is to estimate the
Kendall distribution associated with each pair of variables within the triple.
These estimates allow us to decide whether all pairs of variables actually share
the same underlying Kendall distribution. If yes, then the tree structure of
the triple is the trivial trivariate structure. If not, determining which pair
has a different underlying Kendall distribution allows us to assign the correct
NAC tree structure to the triple of variables.

When the NAC tree structure of each of the $\binom{d}{3}$ triples has been
estimated, it may happen that a proper $d$-variate NAC structure cannot be
retrieved. How to deal with this issue is described in Section 8.

The performance of our approach is then assessed by means of a simulation
study involving target structures in several dimensions. As part of this
simulation study, some comparisons with the work of Okhrin, Okhrin and Schmid
(2013) are also performed using their R package HAC (Okhrin and Ristig 2012a).

Finally, an application section shows the usefulness of our method when
applied on some financial data, and some remaining challenges are outlined in
a discussion section.



2. Archimedean copulas 

Let $(X_1, \ldots, X_d)$ be a vector of continuous random variables. The unique
joint copula of this vector is defined as

$$C(u_1, \ldots, u_d) = P(U_1 \le u_1, \ldots, U_d \le u_d)$$

where $(U_1, \ldots, U_d) = (F_{X_1}(X_1), \ldots, F_{X_d}(X_d))$ and where
$F_{X_1}, \ldots, F_{X_d}$ are the marginal cumulative distribution functions,
or CDFs in short, of $X_1, \ldots, X_d$ respectively.

Archimedean copulas (ACs) are a class of copulas that admit the representation

$$C(u_1, \ldots, u_d) = \psi\bigl(\psi^{-1}(u_1) + \cdots + \psi^{-1}(u_d)\bigr)$$

where $\psi$ is called the generator and $\psi^{-1}$ is its generalized inverse,
with $\psi : [0, \infty) \to [0, 1]$, $\psi(0) = 1$ and $\psi(\infty) = 0$. In
order for $C$ to be a copula, the generator is required to be $d$-monotone on
$[0, \infty)$ (McNeil and Neslehova 2009):

• $(-1)^k \psi^{(k)}(x) \ge 0$ for all $x \ge 0$, $k = 0, 1, \ldots, d-2$, where
  $\psi^{(k)}$ is the $k$th derivative of $\psi(x)$;

• $(-1)^{d-2} \psi^{(d-2)}(x)$ is nonincreasing and convex.

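The generators in the following table are among the most popular ones. All
of them are completely monotone, that is, $d$-monotone for all integer $d \ge 2$.

Table 1: Some popular generators for Archimedean copulas.

name      generator $\psi(x)$                          $\theta$                   Kendall's $\tau$

AMH       $(1-\theta)/(e^{x}-\theta)$                  $\theta \in [0, 1)$        $1 - 2(\theta + (1-\theta)^2 \log(1-\theta))/(3\theta^2)$

Clayton   $(1+x)^{-1/\theta}$                          $\theta \in (0, \infty)$   $\theta/(\theta+2)$

Frank     $-\log(1 - (1-e^{-\theta})e^{-x})/\theta$    $\theta \in (0, \infty)$   $1 + 4(D_1(\theta) - 1)/\theta$

Gumbel    $\exp(-x^{1/\theta})$                        $\theta \in [1, \infty)$   $(\theta-1)/\theta$

Joe       $1 - (1-e^{-x})^{1/\theta}$                  $\theta \in [1, \infty)$   $1 - 4\sum_{k=1}^{\infty} 1/(k(\theta k+2)(\theta(k-1)+2))$

For the Frank family, $D_1(\theta) = \theta^{-1}\int_0^{\theta} t/(e^t - 1)\,dt$
denotes the Debye function of order one.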

The parameter $\theta$ in Table 1 controls the strength of dependence
between any two variables of the related Archimedean copula. This is best
understood by expressing Kendall's $\tau$ coefficient between any two variables
of the related Archimedean copula in terms of $\theta$ (Hofert and Maechler
2011), as done in the last column of the above table.

All margins of the same dimension of an AC are equal. This is because 
C(ui, . . . ,Ud) is a symmetric function in its arguments in the case of Archi- 
medean copulas. For modeling purposes, this becomes an increasingly strong 
assumption as the dimension grows. 
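The representation above and the symmetry it implies can be checked numerically. The following is only a minimal sketch for the Clayton family of Table 1; the function names are ours and purely illustrative.

```python
def clayton_psi(x, theta):
    """Clayton generator psi(x) = (1 + x)^(-1/theta), theta > 0."""
    return (1.0 + x) ** (-1.0 / theta)


def clayton_psi_inv(u, theta):
    """Inverse of the Clayton generator on (0, 1]."""
    return u ** (-theta) - 1.0


def clayton_copula(us, theta):
    """Evaluate C(u_1, ..., u_d) = psi(psi^{-1}(u_1) + ... + psi^{-1}(u_d))."""
    return clayton_psi(sum(clayton_psi_inv(u, theta) for u in us), theta)


def clayton_tau(theta):
    """Kendall's tau of a Clayton copula, last column of Table 1."""
    return theta / (theta + 2.0)


theta = 2.0
c_uv = clayton_copula([0.3, 0.7], theta)
c_vu = clayton_copula([0.7, 0.3], theta)   # arguments swapped
c_margin = clayton_copula([0.3, 1.0], theta)  # setting an argument to 1 gives a margin
```

Swapping the arguments leaves the value unchanged, which is the symmetry discussed above, and setting one argument to 1 returns the other argument, as required of a copula.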



3. Nested Archimedean copulas 

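Asymmetries, allowing for more realistic dependencies, are obtained by plugging
Archimedean copulas into each other (Joe 1997). For instance, in the
two-dimensional Archimedean copula

$$C_{D_0}(u_1, \bullet) = \psi_{D_0}\bigl(\psi_{D_0}^{-1}(u_1) + \psi_{D_0}^{-1}(\bullet)\bigr)$$

the argument $\bullet$ can be replaced by another Archimedean copula, such as

$$C_{D_{23}}(u_2, u_3) = \psi_{D_{23}}\bigl(\psi_{D_{23}}^{-1}(u_2) + \psi_{D_{23}}^{-1}(u_3)\bigr)$$

in order to get a copula of the form

$$C_{D_0}(u_1, C_{D_{23}}(u_2, u_3)) = \psi_{D_0}\Bigl(\psi_{D_0}^{-1}(u_1) + \psi_{D_0}^{-1}\bigl(\psi_{D_{23}}(\psi_{D_{23}}^{-1}(u_2) + \psi_{D_{23}}^{-1}(u_3))\bigr)\Bigr). \tag{3.1}$$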

This last equation describes a copula where the marginal bivariate
distribution of $(U_2, U_3)$ is not the same as the marginal bivariate
distribution of $(U_1, U_2)$ or $(U_1, U_3)$, provided the generators
$\psi_{D_0}$ and $\psi_{D_{23}}$ are different. If the joint CDF of
$(U_1, U_2, U_3)$ were a simple Archimedean copula, all the marginal
bivariate distributions would be the same. This shows how the symmetry
inherent in Archimedean copulas can be broken, although some leftover symmetry
always remains, as the marginal bivariate distributions of $(U_1, U_2)$ and
$(U_1, U_3)$ are the same.
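As a hedged numerical illustration of Equation (3.1), the sketch below nests two Clayton generators (an assumption made for this example only; the parameter values are arbitrary, chosen with the inner parameter at least as large as the outer one) and compares the three bivariate margins, each obtained by setting the remaining argument to 1.

```python
def psi(x, theta):
    """Clayton generator, used here for both nodes as an illustrative choice."""
    return (1.0 + x) ** (-1.0 / theta)


def psi_inv(u, theta):
    """Inverse Clayton generator."""
    return u ** (-theta) - 1.0


def nac_3_1(u1, u2, u3, theta_0, theta_23):
    """Evaluate the copula of Equation (3.1) with Clayton generators."""
    inner = psi(psi_inv(u2, theta_23) + psi_inv(u3, theta_23), theta_23)
    return psi(psi_inv(u1, theta_0) + psi_inv(inner, theta_0), theta_0)


theta_0, theta_23 = 1.0, 4.0   # hypothetical values with theta_0 <= theta_23
u, v = 0.3, 0.6

margin_12 = nac_3_1(u, v, 1.0, theta_0, theta_23)  # copula of (U_1, U_2) at (u, v)
margin_13 = nac_3_1(u, 1.0, v, theta_0, theta_23)  # copula of (U_1, U_3) at (u, v)
margin_23 = nac_3_1(1.0, u, v, theta_0, theta_23)  # copula of (U_2, U_3) at (u, v)
```

Here `margin_12` and `margin_13` coincide, while `margin_23` differs from both, which is the leftover symmetry described above.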

The way Archimedean copulas are nested corresponds to a rooted tree
structure, which will be referred to as the NAC tree structure or sometimes
simply as the structure later. Nested Archimedean copulas, such as the one in
(3.1), are defined through that rooted tree structure and through a collection
of generators, one for each branching node in the tree. If the only nodes in
the tree are the root and the leaves, then the copula is an Archimedean one,
that is, a nested Archimedean copula with trivial structure and only one
generator.

Definition 3.1. Let $D_0$ be a nonempty, finite set with $|D_0| = d$ elements.
For concreteness, let $D_0 = \{U_1, \ldots, U_d\}$. Formally, a rooted tree
structure $\lambda$ on $D_0$ is a collection of nonempty subsets of $D_0$ such
that

(i) $D_0 \in \lambda$;

(ii) $\{a\} \in \lambda$ for every $a \in D_0$;

(iii) if $A, B \in \lambda$, then either $A \subset B$, $B \subset A$, or
$A \cap B = \emptyset$.

The elements of $\lambda$ are called the nodes of the structure. The element
$D_0$ of $\lambda$ is called the root node, or root in short; the singleton
elements $\{a\}$ of $\lambda$ are called the leaves. The nodes of $\lambda$ that
are not leaves are called the branching nodes. If $A, B \in \lambda$ are such
that $A \subset B$, $A \ne B$, and there is no $C \in \lambda$ such that
$A \subset C \subset B$ and $C \ne A$ and $C \ne B$, then $A$ is called a child
of $B$ and conversely $B$ is called the parent of $A$. The set of children of
$B$ in $\lambda$ is denoted by $\mathcal{C}(B, \lambda)$.



For instance, the structure $\lambda$ implied by Equation (3.1) is

$$\{\{U_1, U_2, U_3\}, \{U_2, U_3\}, \{U_1\}, \{U_2\}, \{U_3\}\}$$

and it can be graphically represented as shown in the picture below, where
$D_{23}$ is a convenient label for the subset $\{U_2, U_3\}$.

        D_0
       /   \
     U_1   D_23
           /  \
         U_2  U_3

Figure 1: The tree structure implied by Equation (3.1). To ease the notation
the singletons $\{U_1\}$, $\{U_2\}$ and $\{U_3\}$ are denoted by $U_1$, $U_2$
and $U_3$.

In this structure, $\{U_2\}$ and $\{U_3\}$ are the children of $D_{23}$ while
$\{U_1\}$ and $D_{23}$ are the children of $D_0$, the root node.

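Let $\lambda$ be a rooted tree on $D_0 = \{U_1, \ldots, U_d\}$. Suppose that for
each $B \in \lambda$ with $|B| \ge 2$ we are given an Archimedean generator
$\psi_B$, that is, we are given a generator for each branching node in the
structure.

Define then recursively the function $C_B : [0,1]^{|B|} \to [0,1]$, with
$B \in \lambda$, $|B| \ge 1$, by

$$C_B(u_B) = \begin{cases} \psi_B\Bigl(\sum_{A \in \mathcal{C}(B, \lambda)} \psi_B^{-1}\bigl(C_A(u_A)\bigr)\Bigr) & \text{if } |B| \ge 2, \\ u_b & \text{if } B = \{b\}, \end{cases} \tag{3.2}$$

where $u_B = (u_b : b \in B)$.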

Definition 3.2. A $d$-variate copula $C_{D_0}$ is a nested Archimedean copula
(NAC) if it is of the form $C_B$ in (3.2), with $B = D_0$.

For any $A \subset D_0$ with $|A| \ge 2$, the copula $C_A$ on the variables
$(u_a : a \in A)$ turns out to be a nested Archimedean copula too. To describe
its structure and its generators, we need a few more definitions.

Let $\lambda$ be a NAC structure on $D_0$ and let $T$ be a nonempty subset of
$D_0$. The set $T$ need not be a node of $\lambda$. The NAC structure $\lambda$
induces a NAC structure on $T$ by the following operation:

$$\lambda \cap T = \{T \cap B : B \in \lambda\} \setminus \{\emptyset\}.$$

That is, $\lambda \cap T$ is obtained by intersecting every node $B$ of
$\lambda$ with $T$. Some of these intersections will be empty, and they are
removed. Different nodes $B_1$ and $B_2$ of $\lambda$ may have identical
intersections $B_1 \cap T$ and $B_2 \cap T$ with $T$; since $\lambda \cap T$ is
the collection of all intersections, identical intersections are counted only
once.

It is easy to verify that this construction produces a tree structure on $T$:
verification of (i), (ii), and (iii) in Definition 3.1 is immediate.
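Definition 3.1 and the induced structure $\lambda \cap T$ translate directly into set operations. The following sketch represents a structure as a Python set of frozensets; the names `is_tree_structure` and `restrict` are ours and only serve this illustration.

```python
from itertools import combinations


def is_tree_structure(structure, ground_set):
    """Check requirements (i), (ii) and (iii) of Definition 3.1."""
    ground = frozenset(ground_set)
    has_root = ground in structure                                     # (i)
    has_leaves = all(frozenset({a}) in structure for a in ground)      # (ii)
    nested_or_disjoint = all(                                          # (iii)
        a <= b or b <= a or not (a & b)
        for a, b in combinations(structure, 2)
    )
    return has_root and has_leaves and nested_or_disjoint


def restrict(structure, T):
    """Induced structure: intersect every node with T and drop empty sets.

    Using a set automatically counts identical intersections only once.
    """
    T = frozenset(T)
    return {node & T for node in structure if node & T}


# Structure implied by Equation (3.1), see Figure 1.
lam = {
    frozenset({"U1", "U2", "U3"}),
    frozenset({"U2", "U3"}),
    frozenset({"U1"}),
    frozenset({"U2"}),
    frozenset({"U3"}),
}

lam_12 = restrict(lam, {"U1", "U2"})
lam_23 = restrict(lam, {"U2", "U3"})
```

In this example both induced structures are trivial bivariate trees, each consisting of a root and two leaves.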

Let now $T$ be a subset of $D_0$ containing at least two elements, that is
$|T| \ge 2$. $T$ does not need to be a node of $\lambda$. The smallest common
ancestor (sca) of the elements of $T$ is given by the intersection of all the
nodes $B$ in $\lambda$ that contain $T$, that is,

$$\mathrm{sca}(T, \lambda) = \bigcap_{B \in \lambda : T \subset B} B$$

and it provides the smallest branching node through which the elements of $T$
are linked up. For instance, looking back at Figure 1 one can see that the
smallest common ancestor between $U_2$ and $U_3$ is $D_{23}$, while
$\mathrm{sca}(\{U_1, U_2\}, \lambda) = D_0$ and
$\mathrm{sca}(\{U_1, U_3\}, \lambda) = D_0$.

Let $C_{D_0}$ be a $d$-variate nested Archimedean copula and let $A$ be a
nonempty subset of $D_0$, not necessarily a node in the tree $\lambda$. The
marginal copula $C_A$ on the variables in $A$ is a nested Archimedean copula
too. Its NAC structure is given by $\lambda \cap A$, and the generator function
associated to a branching node $T$ in $\lambda \cap A$ is given by
$\psi_{\mathrm{sca}(T, \lambda)}$.
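The smallest common ancestor is again a plain set computation. The sketch below, with the illustrative name `sca`, reproduces the three values read off Figure 1, using the same set-of-frozensets representation of a structure as an assumption of this example.

```python
def sca(T, structure):
    """Smallest common ancestor: intersection of all nodes that contain T."""
    T = frozenset(T)
    containing = [node for node in structure if T <= node]
    # The root always contains T, so the list is never empty.
    return frozenset.intersection(*containing)


root = frozenset({"U1", "U2", "U3"})
d23 = frozenset({"U2", "U3"})
lam = {root, d23, frozenset({"U1"}), frozenset({"U2"}), frozenset({"U3"})}

sca_23 = sca({"U2", "U3"}, lam)
sca_12 = sca({"U1", "U2"}, lam)
sca_13 = sca({"U1", "U3"}, lam)
```

For the margin on $\{U_1, U_2\}$, the only branching node is $\{U_1, U_2\}$ itself and its smallest common ancestor in $\lambda$ is the root, so that margin inherits the root generator.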



As appealing as it is, Definition 3.2 is unfortunately not sufficient to
guarantee that $C_{D_0}$ and its margins are copulas. A sufficient but not
necessary condition was developed by Joe (1997) and McNeil (2008): the
derivatives of $\psi_I^{-1} \circ \psi_J$ are required to be completely monotone
for every pair of branching nodes $I$ and $J$ in the NAC structure such that $J$
is a child of $I$. As an example, a sufficient condition for $C_{D_0}$ in
Equation (3.1) to be a proper copula is that the derivatives of
$\psi_{D_0}^{-1} \circ \psi_{D_{23}}$ are completely monotone. Although this
sufficient nesting condition was originally formulated only in the context of
fully nested Archimedean copula structures, that is, structures where each
branching node has either two leaves as children, or one leaf and another
branching node, we assume this sufficient nesting condition to hold for any NAC
structure.

The sufficient nesting condition is often easily verified if all generators
appearing in the nested structure come from the same parametric family. For each
family of Table 1, two generators $\psi_I$ and $\psi_J$ of the same family with
corresponding parameters $\theta_I$ and $\theta_J$ will fulfill the sufficient
nesting condition if $\theta_I \le \theta_J$, assuming $J$ is the child of $I$.
Verifying the sufficient nesting condition if $\psi_I$ and $\psi_J$ do not
belong to the same Archimedean family is usually harder, see for instance
Hofert (2010).



4. Identifiability 

Recall that a parameter $\theta$ (possibly infinite-dimensional) in a
statistical model $(P_\theta : \theta \in \Theta)$, with $P_\theta$ a
probability measure on a fixed space, is identifiable if
$\theta_1 \ne \theta_2$ implies that $P_{\theta_1} \ne P_{\theta_2}$, that is,
different parameters yield different distributions of the observable. For
$d$-variate nested Archimedean copulas, the parameter $\theta$ consists of the
pair

$$(\lambda, \{\psi_B : B \in \lambda, |B| \ge 2\}).$$

In this parametrization, the parameter $\theta$ is not identifiable, since
replacing a generator function $\psi_B(x)$ by the function $\psi_B(ax)$, with
$0 < a < \infty$, yields the same copula; that is, the generator functions are
identifiable up to scaling only. This issue can be solved easily in different
ways, for instance by requiring that $\psi_B(1) = 1/2$.

However, a more fundamental identifiability issue arises if some generator
functions are not different. Consider for instance the tree $\lambda$ implied by
Equation (3.1), shown in Figure 1. If the generators $\psi_{D_0}$ and
$\psi_{D_{23}}$ are the same, say $\psi$, then the nested Archimedean copula
with parameter $(\lambda; \psi_{D_0}, \psi_{D_{23}})$ is

$$C_{D_0}(u_1, C_{D_{23}}(u_2, u_3)) = \psi_{D_0}\Bigl(\psi_{D_0}^{-1}(u_1) + \psi_{D_0}^{-1}\bigl(\psi_{D_{23}}(\psi_{D_{23}}^{-1}(u_2) + \psi_{D_{23}}^{-1}(u_3))\bigr)\Bigr) = \psi\bigl(\psi^{-1}(u_1) + \psi^{-1}(u_2) + \psi^{-1}(u_3)\bigr)$$

and actually describes an Archimedean copula with generator $\psi$, that is, a
nested Archimedean copula with trivial tree structure and single generator
$\psi$.

To avoid such an identifiability issue we must require that for any two nodes
$A$ and $B$ such that $A \subset B$ and $A \ne B$, meaning $A$ is a descendant
of $B$ or conversely $B$ is an ancestor of $A$, the bivariate Archimedean
copulas generated by the generator functions $\psi_A$ and $\psi_B$ are
different. If this holds, then the structure $\lambda$ and the generators
$\{\psi_B : B \in \lambda, |B| \ge 2\}$ can be identified (up to scaling) from a
nested Archimedean copula $C_{D_0}$.

Note that some generator functions can still be identical. Consider for
instance the structure in Figure 2.

            D_0
          /     \
       D_12     D_34
       /  \     /  \
     U_1  U_2 U_3  U_4

Figure 2: $D_{12}$ is a convenient label for $\{U_1, U_2\}$, as well as $D_{34}$
for the subset $\{U_3, U_4\}$. Again, we ease the notation by writing
$U_1, \ldots, U_d$ instead of $\{U_1\}, \ldots, \{U_d\}$ for the singletons.

The generators associated to the nodes $D_{12}$ and $D_{34}$ can be identical,
without simplification of the tree being possible.

Also note the implication of this identifiability condition on the sufficient
nesting condition if all generators appearing in the nested structure come from
the same parametric family. For each family of Table 1, two generators $\psi_I$
and $\psi_J$ of the same family with corresponding parameters $\theta_I$ and
$\theta_J$ will fulfill the sufficient nesting condition and the identifiability
condition if $\theta_I$ is strictly less than $\theta_J$, assuming $J$ is a
child of $I$.



5. The smallest common ancestor matrix 

Let $\lambda$ be a NAC structure on $D_0$. Let $A, B \in \lambda$ be two
distinct nodes. Recall that if $A \subset B$ and there is no $C \in \lambda$
such that $A \subset C \subset B$ and $C \ne A$ and $C \ne B$, then $A$ is
called a child of $B$ and conversely $B$ is called the parent of $A$.

The set of all children of a branching node $B$ forms a partition of $B$, that
is, taking the union of all children of a branching node $B$ reconstructs that
branching node. As a consequence, every branching node has at least two
children.

Also recall that if $T$ is a subset of $D_0$ containing at least two elements,
then the smallest common ancestor (sca) of the elements of $T$ is given by the
intersection of all the nodes $B$ in $\lambda$ that contain $T$, that is,

$$\mathrm{sca}(T, \lambda) = \bigcap_{B \in \lambda : T \subset B} B$$

and it provides the smallest branching node through which the elements of $T$
are linked up.

Since the children of a branching node $B$ form a partition of $B$ and since
each branching node has at least two children, it follows that each branching
node can be reconstructed from the pairs of which it is the smallest common
ancestor, that is, for every branching node $B$, we have

$$B = \bigcup \bigl\{\{U_i, U_j\} \subset D_0 : U_i \ne U_j,\ \mathrm{sca}(\{U_i, U_j\}, \lambda) = B\bigr\}. \tag{5.1}$$

The relation "... has the same smallest common ancestor as ..." is an
equivalence relation on the set of pairs $\{U_i, U_j\}$ of $D_0$. This relation
induces a partition of the set of pairs into equivalence classes: two pairs
$\{U_i, U_j\}$ and $\{U_k, U_l\}$ belong to the same equivalence class if and
only if they are related, that is, if and only if they have the same smallest
common ancestor in $\lambda$.



By Equation (5.1), the NAC structure $\lambda$ can be reconstructed from the
equivalence relation it induces on the set of pairs: every equivalence class of
pairs corresponds to a branching node, the branching node being given by the
union of the pairs in that equivalence class. Put differently, the union of all
pairs within an equivalence class yields the branching node that is the smallest
common ancestor for each pair in that equivalence class. Hence, every NAC
structure $\lambda$ on $D_0$ can be represented as a partition on the set of
pairs of $D_0$ and the structure can be recovered from that partition.

A convenient way to display the equivalence classes is by the smallest common
ancestor matrix or in short sca matrix: it is a $d$-by-$d$ symmetric matrix
containing the elements of $D_0$ in the rows and columns and whose element
$(i, j)$, with $i \ne j$, is the label of the node, in $\lambda$, which is the
smallest common ancestor associated with the equivalence class to which the pair
$\{U_i, U_j\}$ belongs. Put more simply: the element $(i, j)$, with $i \ne j$,
of the sca matrix is the name of the node in $\lambda$ which is the smallest
common ancestor of the pair $\{U_i, U_j\}$. Note that although the labels (the
names) of the nodes of a given structure $\lambda$ are arbitrary as long as they
are different, we will always give the name $D_0$ to the root node in this
paper, while a label such as $D_{2379}$ means the related node is
$\{U_2, U_3, U_7, U_9\}$. The diagonal of the smallest common ancestor matrix
always remains empty.

As an example, the sca matrices for Figure 1 and for a trivariate Archimedean
copula are given hereafter:

(i,j)   U_1    U_2    U_3            (i,j)   U_1    U_2    U_3
U_1            D_0    D_0            U_1            D_0    D_0
U_2     D_0           D_23           U_2     D_0           D_0
U_3     D_0    D_23                  U_3     D_0    D_0

When $|D_0| = 3$, there are only four possible NAC structures fulfilling
Definition 3.1:

$\{\{U_1, U_2, U_3\}, \{U_1\}, \{U_2\}, \{U_3\}\}$ = structure $\lambda_{123}$;

$\{\{U_1, U_2, U_3\}, \{U_2, U_3\}, \{U_1\}, \{U_2\}, \{U_3\}\}$ = structure $\lambda_{23}$;

$\{\{U_1, U_2, U_3\}, \{U_1, U_2\}, \{U_1\}, \{U_2\}, \{U_3\}\}$ = structure $\lambda_{12}$;

$\{\{U_1, U_2, U_3\}, \{U_1, U_3\}, \{U_1\}, \{U_2\}, \{U_3\}\}$ = structure $\lambda_{13}$.

The sca matrix given on the left-hand side above corresponds to $\lambda_{23}$,
while the sca matrix given on the right-hand side above corresponds to
$\lambda_{123}$, the trivial trivariate structure. As there are only four
possible structures when $|D_0| = 3$, there are also only four 3-by-3 genuine
sca matrices.
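A sca matrix can be produced mechanically from a structure. The sketch below is one possible implementation, not part of the original method description: the matrix is stored as a dictionary keyed by ordered pairs, the helper names are ours, and the labelling rule ($D_0$ for the root, $D$ followed by the leaf indices otherwise) follows the convention stated above, assuming single-digit leaf indices.

```python
def sca(T, structure):
    """Intersection of all nodes of the structure that contain T."""
    T = frozenset(T)
    return frozenset.intersection(*[node for node in structure if T <= node])


def node_label(node, root):
    """Label a node: 'D0' for the root, otherwise 'D' plus the leaf indices."""
    if node == root:
        return "D0"
    return "D" + "".join(sorted(name[1:] for name in node))


def sca_matrix(leaves, structure):
    """Off-diagonal entries of the sca matrix, keyed by (row, column)."""
    root = frozenset(leaves)
    return {
        (a, b): node_label(sca({a, b}, structure), root)
        for a in leaves
        for b in leaves
        if a != b
    }


leaves = ["U1", "U2", "U3"]
singletons = [frozenset({u}) for u in leaves]
lambda_23 = {frozenset(leaves), frozenset({"U2", "U3"}), *singletons}
lambda_123 = {frozenset(leaves), *singletons}

m23 = sca_matrix(leaves, lambda_23)    # left-hand matrix above
m123 = sca_matrix(leaves, lambda_123)  # right-hand matrix above
```

The dictionary is symmetric by construction and leaves the diagonal out, matching the description of the sca matrix.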

6. Sufficiency of structures on triples

Let $\lambda$ be a NAC structure on a finite set $D_0 = \{U_1, \ldots, U_d\}$,
$d \ge 4$. Suppose that for every triple $K_{ijk} = \{U_i, U_j, U_k\}$ with
distinct $i, j, k \in \{1, \ldots, d\}$, the $3 \times 3$ sca matrix of
$\lambda \cap K_{ijk}$, the tree spanned on $\{U_i, U_j, U_k\}$, is known. The
set of $3 \times 3$ sca matrices built this way includes a total of
$\binom{d}{3}$ sca matrices and will be referred to as $\mathcal{T}(\lambda)$.

In Proposition 6.1 it is shown that the NAC structure $\lambda$ can be recovered
from $\mathcal{T}(\lambda)$. Lemmas 1 to 3 contain some auxiliary results.

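Lemma 1. Let $\lambda$ be a NAC structure on $D_0 = \{U_1, \ldots, U_d\}$. For
$\emptyset \ne T \subset C \subset D_0$, we have

$$\mathrm{sca}(T, \lambda \cap C) = \mathrm{sca}(T, \lambda) \cap C.$$

Proof. By definition, we have

$$\mathrm{sca}(T, \lambda) \cap C = \Bigl(\bigcap_{B \in \lambda : T \subset B} B\Bigr) \cap C = \bigcap_{B \in \lambda : T \subset B} (B \cap C).$$

Since $T$ is a subset of $C$, requiring $T \subset B$ is equivalent to requiring
$T \subset B \cap C$. Thus we can write

$$\mathrm{sca}(T, \lambda) \cap C = \bigcap_{B \in \lambda : T \subset B \cap C} (B \cap C).$$

On the other hand,

$$\mathrm{sca}(T, \lambda \cap C) = \bigcap_{B' \in \lambda \cap C : T \subset B'} B'.$$

Since $\lambda \cap C = \{B \cap C : B \in \lambda\} \setminus \{\emptyset\}$ by
definition, we can rewrite the above expression as

$$\mathrm{sca}(T, \lambda \cap C) = \bigcap_{B \in \lambda : T \subset B \cap C,\ B \cap C \ne \emptyset} (B \cap C).$$

And because $T \subset B \cap C$ and $T \ne \emptyset$, the requirement
$B \cap C \ne \emptyset$ can be dropped, thus

$$\mathrm{sca}(T, \lambda \cap C) = \bigcap_{B \in \lambda : T \subset B \cap C} (B \cap C) = \mathrm{sca}(T, \lambda) \cap C.$$

□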

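Lemma 2. Let $\lambda$ be a tree on $D_0$. For any nonempty subsets
$T_1, T_2, C$ of $D_0$ such that $T_1 \cup T_2 \subset C$, we have

$$\mathrm{sca}(T_1, \lambda) = \mathrm{sca}(T_2, \lambda) \iff \mathrm{sca}(T_1, \lambda \cap C) = \mathrm{sca}(T_2, \lambda \cap C).$$

Proof. By Lemma 1 we have

$$\mathrm{sca}(T_j, \lambda \cap C) = \mathrm{sca}(T_j, \lambda) \cap C \quad \text{with } j = 1, 2.$$

Suppose first $\mathrm{sca}(T_1, \lambda) = \mathrm{sca}(T_2, \lambda)$. We
therefore have

$$\mathrm{sca}(T_1, \lambda \cap C) = \mathrm{sca}(T_1, \lambda) \cap C = \mathrm{sca}(T_2, \lambda) \cap C = \mathrm{sca}(T_2, \lambda \cap C).$$

On the other hand, suppose that
$\mathrm{sca}(T_1, \lambda \cap C) = \mathrm{sca}(T_2, \lambda \cap C)$.
Obviously,

$$\mathrm{sca}(T_1, \lambda) \supset \mathrm{sca}(T_1, \lambda) \cap C$$

and since $T_2$ is both a subset of $\mathrm{sca}(T_2, \lambda)$ and of $C$, we
also have

$$\mathrm{sca}(T_2, \lambda) \cap C \supset T_2.$$

Because $\mathrm{sca}(T_1, \lambda \cap C) = \mathrm{sca}(T_2, \lambda \cap C)$
implies by Lemma 1 that
$\mathrm{sca}(T_1, \lambda) \cap C = \mathrm{sca}(T_2, \lambda) \cap C$, we have

$$\mathrm{sca}(T_1, \lambda) \supset T_2,$$

which means that $\mathrm{sca}(T_1, \lambda)$ is an ancestor of $T_2$, but not
necessarily the smallest. Therefore
$\mathrm{sca}(T_1, \lambda) \supset \mathrm{sca}(T_2, \lambda)$. The converse
inclusion holds as well, by symmetry of the argument. We conclude that the two
sets $\mathrm{sca}(T_1, \lambda)$ and $\mathrm{sca}(T_2, \lambda)$ are in fact
equal. □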

Lemma 3. Let $\lambda$ be a tree on $D_0$ and let $A \in \lambda$. Let $B$ be a
nonempty subset of $D_0$ with at least two elements. The smallest common
ancestor of $B$ is equal to $A$ if and only if $B \subset A$ and there exist
distinct children $B_1$ and $B_2$ of $A$ such that $B \cap B_1 \ne \emptyset$
and $B \cap B_2 \ne \emptyset$.

Proof. Suppose first that $A$ is the smallest common ancestor of $B$. Clearly
$B \subset A$. Let $B_1, \ldots, B_p$ be the children of $A$ and recall these
children form a partition of $A$. Hence
$B = B \cap A = \bigcup_{j=1}^{p} (B \cap B_j)$, and thus at least one of these
intersections is not empty. However, if only one of these intersections were
nonempty, say $B \cap B_1$, then we would get $B = B \cap B_1$ and thus
$B \subset B_1$, meaning that $B_1$ is also a common ancestor of all elements of
$B$. Since $B_1$ is a proper subset of $A$, this would be in contradiction with
the assumption that $A$ is the smallest common ancestor of $B$. Therefore if $A$
is the sca of $B$, $B$ has a nonempty intersection with at least two children of
$A$.

Conversely, suppose that $B \subset A$ and that there exist distinct children
$B_1$ and $B_2$ of $A$ having nonempty intersections with $B$. Let $A'$ be a
node in $\lambda$ such that $B \subset A'$. Then also $B \cap B_1 \subset A'$,
and thus, as $B \cap B_1$ is nonempty, $A' \cap B_1$ is not empty. Similarly,
$A' \cap B_2$ is not empty. Since $B_1$ and $B_2$ are disjoint, requirement
(iii) in Definition 3.1 then forces $B_1$ and $B_2$ to be descendants of $A'$.
As a consequence $A \subset A'$. We have obtained that $A$ is included in every
node $A'$ containing $B$ as a subset. We conclude that $A$ is the smallest
common ancestor of the elements of $B$, as required. □

Proposition 6.1. The NAC structure $\lambda$ can be recovered from the set
$\mathcal{T}(\lambda)$, that is, it is possible to retrieve the partition of the
set of pairs $\{U_i, U_j\}$ of $D_0$ into equivalence classes from the set
$\mathcal{T}(\lambda)$.

Proof. If two distinct pairs have an element in common, their union is a triple.
The equivalence of the two pairs can then be decided from that triple. Indeed,
let $\{U_i, U_j\}$ and $\{U_i, U_k\}$ be two pairs with exactly one element,
$U_i$, in common. To see whether they have the same smallest common ancestor in
$\lambda$, it is sufficient to consider the tree induced by $\lambda$ on the
triple $\{U_i, U_j, U_k\}$: it is known from Lemma 2 that the pairs
$\{U_i, U_j\}$ and $\{U_i, U_k\}$ have the same smallest common ancestor in
$\lambda$ if and only if they have the same smallest common ancestor in
$\lambda \cap \{U_i, U_j, U_k\}$.

However, if two pairs are disjoint, there is no triple containing both pairs.
Still, considering triples turns out to be sufficient to verify their
equivalence: the two pairs can only be equivalent if there is a third pair
equivalent to both of them and having a nonempty intersection with each of them.
Indeed, suppose there exists a pair $\{U_i, U_j\}$ having the same smallest
common ancestor as the pair $\{U_i, U_k\}$. Also suppose $\{U_i, U_k\}$ has the
same smallest common ancestor as $\{U_k, U_l\}$. Then by transitivity
$\{U_i, U_j\}$ has the same smallest common ancestor as $\{U_k, U_l\}$.
Conversely, suppose that $\{U_k, U_l\}$ and $\{U_i, U_j\}$ have the same
smallest common ancestor, $A$. Recall Lemma 3. Let $B_i, B_j, B_k, B_l$ be the
children of $A$ to which $U_i, U_j, U_k, U_l$ belong, respectively. We must have
$B_i \cap B_j = \emptyset$ and $B_k \cap B_l = \emptyset$. Then $B_k$ and $B_l$
cannot both be equal to $B_i$.

• If $B_k$ is different from $B_i$, then $U_i$ and $U_k$ belong to two different
  children of $A$, and the smallest common ancestor of $\{U_i, U_k\}$ is $A$
  too;

• If $B_l$ is different from $B_i$, then, similarly, the smallest common
  ancestor of $\{U_i, U_l\}$ is $A$ too.

In both cases, we have found a pair that is equivalent to $\{U_i, U_j\}$ and
$\{U_k, U_l\}$ and that has a nonempty intersection with each of them. □

Hereafter is a practical example on how to retrieve $\lambda$ from
$\mathcal{T}(\lambda)$ for the case $d = 4$, $\binom{4}{3} = 4$.

Suppose the $3 \times 3$ sca matrices are:

        U_1   U_2   U_3                  U_1   U_2   U_4
U_1           H     I            U_1           K     L
U_2     H           I            U_2     K           L
U_3     I     I                  U_4     L     L

        U_1   U_3   U_4                  U_2   U_3   U_4
U_1           M     M            U_2           Q     Q
U_3     M           N            U_3     Q           R
U_4     M     N                  U_4     Q     R

From this, we get that

• The smallest common ancestors of the pair $\{U_1, U_2\}$ are $\{H, K\}$;

• The smallest common ancestors of the pair $\{U_1, U_3\}$ are $\{I, M\}$;

• The smallest common ancestors of the pair $\{U_1, U_4\}$ are $\{L, M\}$;

• The smallest common ancestors of the pair $\{U_2, U_3\}$ are $\{I, Q\}$;

• The smallest common ancestors of the pair $\{U_2, U_4\}$ are $\{L, Q\}$;

• The smallest common ancestors of the pair $\{U_3, U_4\}$ are $\{N, R\}$.

It appears therefore that $\{U_1, U_3\}$, $\{U_1, U_4\}$, $\{U_2, U_3\}$ and
$\{U_2, U_4\}$ belong to the same equivalence class, while $\{U_1, U_2\}$ is all
alone, as well as $\{U_3, U_4\}$. The branching nodes of $\lambda$ in this case
are therefore $\{U_1, U_2, U_3, U_4\}$, $\{U_1, U_2\}$ and $\{U_3, U_4\}$. The
rooted tree structure $\lambda$ is thus as shown in Figure 2.

The general procedure for any $d \ge 4$ is:

1. Establish a list of all possible pairs $\{U_i, U_j\}$, $i < j$;

2. For each pair, get from $\mathcal{T}(\lambda)$ the set of smallest common
   ancestors. Each pair should appear in $d - 2$ trivariate sca matrices and
   thus $d - 2$ smallest common ancestors should be available for each pair;

3. Intersect the set of smallest common ancestors of each pair with the sets
   of the other pairs. Any nonempty intersection means the two pairs are
   related, that is, belong to the same equivalence class;

4. Take the union of all pairs within each equivalence class to get the
   branching nodes of the structure;

5. Add the leaves to get $\lambda$.
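The five steps above can be sketched as follows. This is an illustrative implementation under our own conventions, not code from the paper: each trivariate sca matrix is passed as a dictionary mapping a pair of leaves to the label of its smallest common ancestor, labels are assumed distinct across matrices as in the worked example, and the equivalence classes of step 3 are closed under transitivity with a small union-find.

```python
from itertools import combinations


def reconstruct_structure(leaves, triple_matrices):
    """Rebuild a NAC structure from trivariate sca matrices (steps 1 to 5)."""
    # Step 1: list all pairs.
    labels = {frozenset(pair): set() for pair in combinations(leaves, 2)}
    # Step 2: collect the sca labels of each pair over all triples.
    for matrix in triple_matrices:
        for pair, label in matrix.items():
            labels[frozenset(pair)].add(label)

    # Step 3: pairs sharing a label are related; union-find gives the classes.
    parent = {pair: pair for pair in labels}

    def find(pair):
        while parent[pair] != pair:
            parent[pair] = parent[parent[pair]]
            pair = parent[pair]
        return pair

    for p, q in combinations(labels, 2):
        if labels[p] & labels[q]:
            parent[find(p)] = find(q)

    # Step 4: the union of the pairs in a class is a branching node.
    classes = {}
    for pair in labels:
        classes.setdefault(find(pair), set()).update(pair)
    branching = {frozenset(members) for members in classes.values()}

    # Step 5: add the leaves.
    return branching | {frozenset({u}) for u in leaves}


# The four 3x3 sca matrices of the worked example (d = 4).
matrices = [
    {("U1", "U2"): "H", ("U1", "U3"): "I", ("U2", "U3"): "I"},
    {("U1", "U2"): "K", ("U1", "U4"): "L", ("U2", "U4"): "L"},
    {("U1", "U3"): "M", ("U1", "U4"): "M", ("U3", "U4"): "N"},
    {("U2", "U3"): "Q", ("U2", "U4"): "Q", ("U3", "U4"): "R"},
]
structure = reconstruct_structure(["U1", "U2", "U3", "U4"], matrices)
```

On the worked example this returns the structure of Figure 2, with branching nodes $\{U_1, U_2, U_3, U_4\}$, $\{U_1, U_2\}$ and $\{U_3, U_4\}$.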

7. Nonparametric estimation of a trivariate NAC structure 

Let $(X_1, X_2, X_3)$ be a vector of continuous random variables such that the
joint distribution of
$(U_1, U_2, U_3) = (F_{X_1}(X_1), F_{X_2}(X_2), F_{X_3}(X_3))$ is a nested
Archimedean copula, and where $F_{X_1}$, $F_{X_2}$ and $F_{X_3}$ are the
marginal CDFs of $(X_1, X_2, X_3)$. We are interested in estimating the NAC
structure based on $n$ observations $(x_{l1}, x_{l2}, x_{l3})$ from
$(X_1, X_2, X_3)$, $l = 1, \ldots, n$.

Remember there are only four NAC structures possible for the trivariate
case, as outlined at the end of Section 5. With the trivial structure (structure
$\lambda_{123}$), all marginal bivariate distributions of the NAC are the same
while in structures $\lambda_{23}$, $\lambda_{12}$ and $\lambda_{13}$, two
marginal bivariate distributions are the same and one is different. Moreover, if
the marginal bivariate distributions are not all the same, being able to
determine the one that is different from the two others is enough to determine
whether the NAC structure is structure $\lambda_{23}$, $\lambda_{12}$ or
$\lambda_{13}$.



It is known from Genest and Rivest (1993) that the Kendall distribution of
a pair of variables $(X_j, X_k)$ fully determines the copula of that pair if it
is an Archimedean copula. Thus, rather than working directly with bivariate
distributions, let us work with the related Kendall distributions which are
univariate and therefore easier to handle. The Kendall distribution of the pair
$(X_j, X_k)$ is defined as the distribution of the variable

$$W_{jk} = C_{jk}(U_j, U_k) = H_{jk}(X_j, X_k)$$

where $C_{jk}(u_j, u_k) = P(U_j \le u_j, U_k \le u_k)$ is the joint CDF of
$(U_j, U_k)$ and where $H_{jk}(x_j, x_k) = P(X_j \le x_j, X_k \le x_k)$ is the
joint CDF of $(X_j, X_k)$. The map defined, for all $w \in [0, 1]$, by

$$K_{jk}(w) = P(W_{jk} \le w)$$

is the Kendall distribution function (Barbe et al. 1996; Nelsen et al. 2003;
Genest and Rivest 2001).



The Kendall distribution function of a pair of variables $(X_j, X_k)$ can be
estimated (Genest, Neslehova and Ziegel 2011) by first computing its
pseudo-observations $w_{1,jk}, \ldots, w_{n,jk}$ and then by computing the
empirical distribution function of these pseudo-observations:

$$w_{m,jk} = \frac{1}{n+1} \sum_{l=1}^{n} 1(x_{lj} < x_{mj},\ x_{lk} < x_{mk}), \qquad m = 1, \ldots, n,$$

$$F_{n,jk}(x) = \frac{1}{n} \sum_{m=1}^{n} 1(w_{m,jk} \le x), \quad \text{with } 0 \le x \le 1.$$

Since there are three possible pairs in our case, namely $(X_1, X_2)$,
$(X_1, X_3)$ and $(X_2, X_3)$, three empirical Kendall distribution functions
can be estimated. A distance between the empirical Kendall distribution
functions of $(X_i, X_j)$ and $(X_i, X_k)$ is defined as

$$\int_0^1 \bigl|F_{n,ij}(x) - F_{n,ik}(x)\bigr|\,dx = \frac{1}{n} \sum_{m=1}^{n} \bigl|w_{(m),ij} - w_{(m),ik}\bigr| = \delta_{ij,ik}$$

where $w_{(1),ij}, \ldots, w_{(n),ij}$ are the ordered pseudo-observations of the
Kendall distribution related to the variables $(X_i, X_j)$ and
$w_{(1),ik}, \ldots, w_{(n),ik}$ are the ordered pseudo-observations of the
Kendall distribution related to the variables $(X_i, X_k)$.
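The pseudo-observations and the distance between two empirical Kendall distributions can be computed as sketched below. The function names are ours, and the $1/(n+1)$ normalization of the pseudo-observations is an assumption of this sketch (other rank-based conventions differ only in that constant); the distance uses the order-statistics form, which equals the integral of the absolute difference of the two empirical distribution functions when both samples have size $n$.

```python
def kendall_pseudo_obs(x, y):
    """Pseudo-observations of the Kendall distribution of the pair (x, y).

    For observation m, count the observations lying strictly below it in
    both coordinates; the 1/(n + 1) scaling is an assumed convention.
    """
    n = len(x)
    return [
        sum(1 for l in range(n) if x[l] < x[m] and y[l] < y[m]) / (n + 1)
        for m in range(n)
    ]


def kendall_distance(w_a, w_b):
    """Mean absolute difference between the ordered pseudo-observations."""
    a, b = sorted(w_a), sorted(w_b)
    return sum(abs(s - t) for s, t in zip(a, b)) / len(a)


# Toy data: a perfectly increasing pair and a perfectly decreasing pair.
x = [1.0, 2.0, 3.0, 4.0]
w_increasing = kendall_pseudo_obs(x, [1.0, 2.0, 3.0, 4.0])
w_decreasing = kendall_pseudo_obs(x, [4.0, 3.0, 2.0, 1.0])
delta = kendall_distance(w_increasing, w_decreasing)
```

Only comparisons between observations enter the computation, so the result depends on the data through the ranks alone.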

Typically, a trivial structure will result in three distances that are all about
the same, while structures such as $\lambda_{12}$, $\lambda_{13}$ or
$\lambda_{23}$ will result in one small distance relative to two other distances
that are bigger and about the same. Thus for any triple of variables
$(X_i, X_j, X_k)$, if, for instance, $\delta_{ij,ik}$ is the minimum among the
three distances, it seems reasonable to assume that either the structure of the
triple is the trivial structure or the structure $\lambda_{jk}$ where
$(X_i, X_j)$ and $(X_i, X_k)$ have the same Kendall distribution.

The problem of determining the structure of $(X_1, X_2, X_3)$ can be rewritten
as a hypothesis test:

$H_0$ : the true structure is the trivial structure.

$H_1$ : the true structure is structure $\lambda_{12}$ or $\lambda_{13}$ or
$\lambda_{23}$, depending on which distance was the minimum observed distance.

As a test statistic, the absolute difference between the minimum distance
and the average of the two remaining distances is used. The null hypothesis is
rejected when the test statistic is observed in the upper tail of its $H_0$
distribution.

Unfortunately, the $H_0$ distribution of the test statistic is unknown. Under
$H_0$, the original sample is assumed to come from an unknown trivariate
Archimedean copula. Using the work of Genest et al. (2011), it is possible to
estimate that Archimedean copula nonparametrically and to resample from that
estimated AC; see Genest et al. (2011) for more details. For each new sample,
the three empirical Kendall distributions, the three distances, and the related
test statistic are computed. The p-value of the observed test statistic is then
estimated by the proportion of test statistics obtained from the new samples
that are greater than or equal to the value of the observed test statistic
obtained from the original sample. Should this estimated p-value be lower than
or equal to a threshold $\alpha$, for instance 10%, the null hypothesis is
rejected.
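
The test statistic and the resampling p-value described above can be sketched as follows. The sketch assumes the test statistics of the resampled data sets (`null_stats`) have already been computed; the nonparametric estimation of the Archimedean copula and the resampling from it (Genest et al., 2011) are not reproduced here, and all function names are hypothetical.

```python
def structure_test_statistic(distances):
    """|minimum distance - average of the two remaining distances|."""
    d = sorted(distances)
    return abs(d[0] - (d[1] + d[2]) / 2)


def bootstrap_p_value(observed_stat, null_stats):
    """Proportion of resampled statistics >= the observed statistic."""
    exceed = sum(1 for s in null_stats if s >= observed_stat)
    return exceed / len(null_stats)


def reject_trivial(observed_stat, null_stats, alpha=0.10):
    """True when the trivial structure (H0) is rejected at level alpha."""
    return bootstrap_p_value(observed_stat, null_stats) <= alpha
```

For instance, an observed statistic that is exceeded or matched by 1 resampled statistic out of 20 yields an estimated p-value of 0.05 and leads to rejection at $\alpha = 0.10$.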

Note that the estimator for the Kendall distribution depends on the data 
only through the ranks and since our hypothesis test depends on this estimator, 
the resulting NAC structure estimator we developed here is rank-based too. 

There are two key points in the test presented above:

• First, determine what should be the alternative hypothesis. Should it be
structure $A_{12}$, $A_{13}$ or $A_{23}$?

• Second, choose between a trivial structure ($= H_0$) and $H_1$.

Possible errors are:

• If the true structure is the trivial structure, rejecting it and therefore
committing a type I error;

• If the true structure is structure $A_{12}$, $A_{13}$ or $A_{23}$, failing to
reject $H_0$ (type II error);

• If the true structure is for instance structure $A_{12}$, getting a wrong
$H_1$ and then picking it (we will call this a type III error).

The main difficulty with the test developed in this section is encountered
when the true structure is the trivial structure, that is, the structure one gets
when the nested Archimedean copula is actually a simple Archimedean copula.
Indeed, if the probability of committing a type I error is fixed to
$\alpha = 0.10$, the trivial structure will be rejected 10% of the time
regardless of the input sample size $n$. Our estimator is therefore not a
consistent estimator for the trivial trivariate structure, unless we let
$\alpha$ tend to 0 as $n$ increases, a key point if one hopes to achieve
consistency for any trivariate NAC structure, including the trivial one.



8. Reconstruction of a NAC structure based on a set of estimated 
trivariate structures 

Let $A$ be a NAC structure on a finite set $D = \{U_1, \ldots, U_d\}$,
$d \geq 4$. It is known from an earlier section that if, for every triple
$K_{ijk} = \{U_i, U_j, U_k\}$ with different $i, j, k \in \{1, \ldots, d\}$, the
$3 \times 3$ sea matrix of $A \cap K_{ijk}$ is known, then $A$ can be recovered
from that set of $3 \times 3$ sea matrices, this set of sea matrices being
referred to as ^(A). 

However, if each of the $3 \times 3$ sea matrices is estimated, the problem is a
bit different. Indeed, it is not guaranteed that a proper NAC structure can be
recovered from a given set of estimated $3 \times 3$ sea matrices. When the
global estimated structure A retrieved from ^(A) is not a genuine NAC structure,
meaning it does not fulfill Definition 3.1, we call ^(A) a faulty set.

With a value of $\alpha$ equal to 0.00 for all tests, we fail to reject the null
hypothesis everywhere and we therefore get a set of estimated sea matrices each
describing a trivial trivariate structure. Such a set is never a faulty set, and
A, the estimated global NAC structure retrieved from it, will always be a trivial
structure of dimension $d$. Of course, if the true structure is not a trivial
structure of dimension $d$, a value of $\alpha$ equal to 0.00 means type II
errors are certain to be committed.

With a value of $\alpha$ equal to 1.00 for all tests, all null hypotheses are
rejected and we end up with a set where each $3 \times 3$ estimated sea matrix
describes a non-trivial trivariate structure. Such a set can be a faulty set and
usually is.
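
Whether a set of estimated trivariate structures is faulty can be checked mechanically. The sketch below does not use the sea matrices of the paper; it swaps in the BUILD algorithm of Aho et al. for rooted triples, and it assumes that a genuine NAC structure corresponds to a rooted tree whose leaves are the variables. Variables are represented by the integers 0, ..., d - 1, each triple `(i, j, k)` with `i < j < k` maps either to its nested pair or to `None` (trivial structure), and all names are hypothetical.

```python
from itertools import combinations


def build(leaves, resolved):
    """BUILD: nested-tuple tree displaying every resolved triple, or None.

    A resolved triple (a, b, c) states that a and b are nested apart from c.
    """
    leaves = list(leaves)
    if len(leaves) == 1:
        return leaves[0]
    leaf_set = set(leaves)
    root = {x: x for x in leaves}  # simple union-find forest

    def find(x):
        while root[x] != x:
            x = root[x]
        return x

    for a, b, c in resolved:
        if a in leaf_set and b in leaf_set and c in leaf_set:
            root[find(a)] = find(b)
    groups = {}
    for x in leaves:
        groups.setdefault(find(x), []).append(x)
    if len(groups) == 1:
        return None  # the leaves cannot be split: contradictory triples
    children = [build(g, resolved) for g in groups.values()]
    return None if any(c is None for c in children) else tuple(children)


def leaf_paths(tree, prefix=()):
    """Map each leaf to its path of child indices from the root."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for idx, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (idx,)))
    return paths


def induced_structure(paths, i, j, k):
    """Nested pair induced by the tree on (i, j, k), or None if trivial."""
    def depth(a, b):  # depth of the deepest common ancestor of a and b
        n = 0
        for s, t in zip(paths[a], paths[b]):
            if s != t:
                break
            n += 1
        return n

    depths = {(i, j): depth(i, j), (i, k): depth(i, k), (j, k): depth(j, k)}
    if len(set(depths.values())) == 1:
        return None
    return max(depths, key=depths.get)


def is_faulty(d, structures):
    """True when no rooted tree induces exactly the given trivariate structures."""
    resolved = []
    for triple, pair in structures.items():
        if pair is not None:
            (other,) = set(triple) - set(pair)
            resolved.append((pair[0], pair[1], other))
    tree = build(range(d), resolved)
    if tree is None:
        return True
    paths = leaf_paths(tree)
    return any(induced_structure(paths, *t) != structures[t]
               for t in combinations(range(d), 3))
```

The check first builds a tree from the non-trivial triples alone and then verifies that this tree reproduces every estimated triple, including the trivial ones.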

Assuming the copula of the vector $(X_1, \ldots, X_d)$ is a NAC, a faulty set of
estimated trivariate structures means at least one error (type I, type II or type
III) has been committed. Notice the converse is not true: even when at least
one type I, type II or type III error has been committed, the set of estimated
trivariate structures might lead to a global estimated NAC structure meeting
Definition 3.1.

How to properly deal with a faulty set remains an open problem. As done in
the simulation study, we simply suggest in such a case to decrease the value of
$\alpha$ for all tests until the resulting set of estimated trivariate
structures is no longer a faulty set. At worst, $\alpha$ is decreased down to
0.00, and we end up with a set of trivial trivariate structures. The global
predicted structure is then the trivial structure of dimension $d$.
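
The suggested remedy can be sketched as a simple loop. The step by which $\alpha$ is decreased is not specified in the text, so the value `step=0.01` below is an assumption, as are the function names and the `is_faulty` callback, which stands for any check of whether a set of trivariate structures is faulty.

```python
def structures_at_level(candidates, p_values, alpha):
    """Keep the candidate nested pair where p <= alpha, else trivial (None)."""
    return {
        t: (candidates[t] if p_values[t] <= alpha else None)
        for t in candidates
    }


def decrease_alpha_until_valid(candidates, p_values, is_faulty,
                               alpha=0.10, step=0.01):
    """Lower alpha for all tests until the set is no longer faulty.

    candidates: triple -> candidate nested pair (from the minimum distance)
    p_values:   triple -> estimated p-value of the test for that triple
    is_faulty:  callable taking the dict of structures, returning a bool
    """
    n_steps = int(round(alpha / step))
    for s in range(n_steps, 0, -1):
        level = s * step
        structures = structures_at_level(candidates, p_values, level)
        if not is_faulty(structures):
            return level, structures
    # alpha = 0.00: nothing is rejected, every triple is trivial
    return 0.0, {t: None for t in candidates}
```

Since the p-values do not change when $\alpha$ is lowered, no resampling has to be repeated inside the loop.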



9. Simulation study 

9.1. Testing the method with samples from a trivial trivariate structure 

Let $(U_1, U_2, U_3)$ be a vector of random variables having an Archimedean
copula as joint distribution. We generate 500 samples of size $n$ from this
vector using the R package nacopula (Hofert and Maechler, 2011), which has since
been merged with the copula package. With $\alpha = 0.10$, how many times among
the 500 samples are we able to retrieve the true structure? Figure 3 shows the
percentage of correct predictions for various values of $n$, various generator
families and two different values of the related parameter $\theta$, expressed
as Kendall's $\tau$ coefficient for convenience according to Table 1.

Figure 3: Percentage of correct predictions when the true structure is the trivial
trivariate structure.



As expected, the percentage of correct predictions does not converge towards
100% but simply oscillates around 90%. In order for our estimator to be
consistent for a trivial trivariate structure, and therefore for any larger NAC
structure that has at least one trivariate component equal to the trivial
trivariate structure, we have to let $\alpha$ tend to 0 as $n$ increases to
ensure type I errors are asymptotically impossible.



To apply the method from Okhrin et al. (2013), we use the function
estimate.copula of the R package HAC. As done in the simulation section of
Okhrin and Ristig (2012b), we set epsilon to 0.15 for the aggregation step
and use the default aggregation method. Since only the Clayton and Gumbel
generator families are currently implemented in the HAC package and since the
estimator from Okhrin et al. (2013) requires the knowledge of the generator
family prior to the estimation of the structure, no comparison with the
performance of our estimator for other families is possible at the time of
writing.

Figure 4: Performance of the estimator developed in Okhrin et al. (2013), with
$\epsilon = 0.15$ and default aggregation method.



Increasing the value of $\epsilon$ improves the performance of their estimator
in the case of a trivial trivariate structure, but decreases the performance of
the same estimator when the target structure is a non-trivial trivariate
structure. Later in this section, a seven-variate structure made up only of
non-trivial trivariate structures is tested. For this last structure, a value of
$\epsilon = 0.15$ is actually already too high and leads to poor performance of
their estimator, thus preventing us from using a higher value of $\epsilon$ here
(the same value of $\epsilon$ for their estimator is used throughout the
simulation study, as well as the same value of $\alpha$ for our approach).

9.2. Testing the method with samples from a non-trivial trivariate structure

Given 500 samples of size $n$ from a non-trivial trivariate structure, such as
the one in Figure 1, and $\alpha = 0.10$, how many times among the 500 samples
are we able to retrieve this non-trivial trivariate structure? Figure 5 shows the
percentage of correct predictions for various values of $n$ and various generator
families. Note that the same generator family is always used across all nodes of
a given structure in the simulation section of this paper. The parameters
$\theta_0$ (root node, $D_0$) and $\theta_{23}$ (the other branching node,
$D_{23}$) are expressed as Kendall's $\tau$ coefficients for convenience:

Figure 5: Percentage of correct predictions when the true structure is structure
$A_{23}$.



As the sample size increases, there is a clear convergence towards 100% of
correct predictions. The further apart $\tau_0$ and $\tau_{23}$ are, the faster
the convergence towards 100% of correct predictions (compare the two horizontal
axes above). These results strongly suggest our estimator is a consistent
estimator for any non-trivial trivariate NAC structure and thus for any larger
NAC structure made up only of non-trivial trivariate structures.

Using the method from Okhrin et al. (2013) as we did in the previous subsection
($\epsilon = 0.15$, default aggregation method), we found that the percentage
of correct predictions also converges towards 100%, but at a much faster rate:
their estimator clearly outperforms our estimator this time. The rate of
convergence can be improved further by lowering the value of $\epsilon$.
However, recall that their estimator uses the knowledge of the generator family.



9.3. Testing the method with a four-variate structure 

Suppose 500 samples of size $n$ are generated from the structure shown in the
left-hand part of Figure 6, with $\tau_0 = 0.3$ and $\tau_{34} = 0.7$. With a
fixed value of $\alpha = 0.10$, how many times are we able to retrieve this
four-variate structure among the 500 samples? The right-hand part of Figure 6
shows the percentage of correct predictions for various values of $n$ and
various generator families:

Figure 6: Percentage of correct predictions for a four-variate case.



The percentage of correct predictions eventually oscillates around 97%. There
is no convergence towards 100%, which was expected for this structure since two
of its trivariate components are trivial trivariate structures. Remark: in case
the global predicted structure is not a genuine NAC structure, the value of
$\alpha$ is decreased until the global predicted structure becomes a genuine NAC
structure, possibly a trivial four-variate structure; refer to Section 8 for
more details.

Using the method from Okhrin et al. (2013) as we did in the previous subsections
($\epsilon = 0.15$, default aggregation method), we found that our estimator
outperforms their estimator by a large margin. For instance, with a sample size
$n = 75$, the percentage of correct predictions is 20% for the Clayton family
and 50% for the Gumbel family, versus 97% for all families tested with our
estimator.



9.4. A seven-variate case

The performance of our method for a larger structure will be assessed by
generating 500 samples of size $n$ from the structure on the left-hand part of
Figure 7, with $\tau_0 = 0.1$, $\tau_{123} = 0.3$, $\tau_{23} = 0.6$,
$\tau_{4567} = 0.3$, $\tau_{567} = 0.5$ and $\tau_{67} = 0.8$. With a value of
$\alpha = 0.10$, how many times are we able to retrieve this seven-variate
structure among the 500 samples? The right-hand part of Figure 7 shows the
percentage of correct predictions for various values of $n$ and three generator
families:



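[Figure 7, left panel: tree with root $D_0$ and children $D_{123}$ and
$D_{4567}$; $D_{123}$ has children $U_1$ and $D_{23}$ (leaves $U_2$, $U_3$);
$D_{4567}$ has children $U_4$ and $D_{567}$, which has children $U_5$ and
$D_{67}$ (leaves $U_6$, $U_7$). Right panel: percentage of correct predictions
against $n$, for $n$ from 200 to 1200.]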



Figure 7: Percentage of correct predictions for a seven-variate case. 



A convergence towards 100% of correct predictions can be observed. As
there are no trivariate components equal to the trivial trivariate structure in
the global structure, this was expected. As in the previous subsection, faulty
structures are handled by decreasing $\alpha$ until a valid structure emerges.

Using the method from Okhrin et al. (2013) as we did in the previous subsections
($\epsilon = 0.15$, default aggregation method), we found that our estimator
outperforms their estimator by a large margin. For instance, with a sample size
$n = 1200$, the percentage of correct predictions for the Clayton family is
barely 40% versus 100% for our estimator, and the convergence towards 100% of
correct predictions is very slow. Lowering the value of $\epsilon$ can help,
however: for instance, with a value of $\epsilon = 0$ (the smallest allowed),
the percentage of correct predictions is 96% with a sample size as low as
$n = 200$. However, a value of $\epsilon$ equal to 0 for their estimator means
that no aggregation is done anymore, making their estimator biased for many
structures, for instance any trivial structure of dimension $d$.



10. Application 

Daily log returns from January 2010 to December 2012 of 

• Abercrombie & Fitch Co. (ANF), traded in New York, 

• Amazon.com Inc. (AMZN), traded in New York, 

• China Mobile Limited (ChM), traded in Hong Kong, 

• PetroChina (PCh), traded in Hong Kong, 

• Groupe Bruxelles Lambert (GBLB), traded in Brussels,

• and KBC Group (KBC), traded in Brussels,

were gathered with the help of Yahoo! Finance ($n = 700$ observations,
$d = 6$). Figure 8 shows the estimated structure for ANF, AMZN, ChM and PCh,
the estimated structure for ANF, AMZN, GBLB and KBC, and the estimated
structure for ChM, PCh, GBLB and KBC.





[Three tree diagrams, with leaves (ANF, AMZN, ChM, PCh), (ANF, AMZN, GBLB, KBC)
and (ChM, PCh, GBLB, KBC).]

Figure 8: Given two log returns from one geographical area and two from another
area, a natural clustering by area arises. The above structures are all strongly
supported by the data, as the 12 related p-values are less than 10e-04.

In order to build a six-variate structure, we need to estimate the structure
of eight extra triples. The left-hand panel of Figure 9 shows a reasonable guess
for the six-variate structure in which the eight extra triples all have a trivial
trivariate structure.

Figure 9: Possible six-variate structures for the data.
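
The counts of 12 triples covered by the three four-variate structures and of eight extra triples can be verified by enumeration. The short sketch below only uses the tickers and trading places listed above; the variable names are illustrative.

```python
from itertools import combinations

# Trading place of each of the six stocks
market = {
    "ANF": "New York", "AMZN": "New York",
    "ChM": "Hong Kong", "PCh": "Hong Kong",
    "GBLB": "Brussels", "KBC": "Brussels",
}

triples = list(combinations(market, 3))

# Triples spanning two trading places are contained in one of the three
# four-variate structures; triples spanning all three places are the extra ones.
covered_triples = [t for t in triples if len({market[s] for s in t}) == 2]
extra_triples = [t for t in triples if len({market[s] for s in t}) == 3]

print(len(triples), len(covered_triples), len(extra_triples))  # 20 12 8
```

Each extra triple contains exactly one stock from each trading place, which is why the four-variate structures carry no information about it.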

However, the trivial trivariate structure in four of the eight extra triples is
strongly rejected by the data, which suggests the structure in the right-hand
panel of Figure 9. Unfortunately, this last structure implies we must reject the
trivial trivariate structure for all eight extra triples and not only for half of
them, making the prediction of a six-variate structure quite difficult. Since
both PetroChina and China Mobile are traded not only in Hong Kong but also in
New York, we could expect their log returns in Hong Kong to be more related to
the log returns of some companies in New York (for instance ANF and AMZN) than to
the log returns of two companies in Belgium. The structure in the right-hand
panel of Figure 9 therefore seems more appropriate.

11. Discussion 

In this paper, we have paved the way for a nonparametric rank-based approach
to estimate a NAC structure that requires no knowledge about the nested
Archimedean copula prior to the estimation of its structure. A number of
challenges remain, however:

• Difficulties can appear when the method is applied to real data for which
the true copula is not necessarily a NAC. For instance, one can end up
with a subset of estimated non-trivial trivariate structures each strongly
supported by the data (that is, very small p-values, meaning type I or
type III errors are unlikely) and yet these non-trivial trivariate structures
contradict each other in the sense that no global structure can be retrieved.



• Assuming the true copula of a sample is a NAC, being able to cope with
a faulty set of estimated trivariate structures in a different way than the
one suggested at the end of Section 8 and applied in the simulation section
might result in better performance of the estimator, especially for small
samples.

• The whole method is computationally intensive, unlike the method from
Okhrin et al. (2013). This is best understood by calculating the number
of triples for which a test is necessary with our approach: with $d = 10$,
we indeed have to estimate 120 trivariate structures. With $d = 20$, this
number increases to 1140. An optimized R code is available from the
authors.

• Given an input sample of size $n \times d$, it is unclear how to determine what
should be the optimal value for $\alpha$. Asymptotically, one should have
$\alpha_n \to 0$ if one hopes to have a consistent estimator for any NAC
structure.

• Once a genuine NAC structure has been estimated, the problem of esti- 
mation of the generators remains. These generators cannot be estimated 
marginally, as doing so does not guarantee that the resulting function will 
be a proper copula. 
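
The numbers of triples quoted in the list above follow from the binomial coefficient $\binom{d}{3} = d(d-1)(d-2)/6$, as the following short sketch illustrates (the function name is hypothetical):

```python
from math import comb


def number_of_triples(d):
    """Number of trivariate structures to estimate for d variables."""
    return comb(d, 3)


for d in (6, 10, 20):
    print(d, number_of_triples(d))  # 6 20, then 10 120, then 20 1140
```

The cubic growth in $d$ is what makes the method costly, since each triple requires its own resampling-based test.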